Overview

Dataset statistics

Number of variables: 33
Number of observations: 2080
Missing cells: 11772
Missing cells (%): 17.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 536.4 KiB
Average record size in memory: 264.1 B
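
The two derived figures in this table can be checked by hand. The sketch below assumes that the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes; both assumptions reproduce the reported values, but the report itself does not state them.

```python
# Figures copied from the dataset statistics of this report.
n_variables = 33
n_observations = 2080
missing_cells = 11772
total_size_kib = 536.4

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```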

Variable types

CAT: 22
NUM: 10
UNSUPPORTED: 1

Reproduction

Analysis started: 2020-06-27 02:28:53.843135
Analysis finished: 2020-06-27 02:29:10.150339
Duration: 16.31 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

atty_firm_name has a high cardinality: 195 distinct values High cardinality
detail_cause has a high cardinality: 56 distinct values High cardinality
how_injury_occur has a high cardinality: 2077 distinct values High cardinality
injury_city has a high cardinality: 392 distinct values High cardinality
injury_postal has a high cardinality: 550 distinct values High cardinality
detail_cause is highly correlated with cause High correlation
cause is highly correlated with detail_cause High correlation
osha_injury_type is highly correlated with nature_injury High correlation
nature_injury is highly correlated with osha_injury_type and 1 other field High correlation
type_loss is highly correlated with nature_injury High correlation
Dependent has 2080 (100.0%) missing values Missing
ave_wkly_wage has 1304 (62.7%) missing values Missing
claimant_age has 308 (14.8%) missing values Missing
atty_firm_name has 1695 (81.5%) missing values Missing
marital_status has 1742 (83.8%) missing values Missing
depart_code has 1095 (52.6%) missing values Missing
injury_postal has 513 (24.7%) missing values Missing
#dependents has 2016 (96.9%) missing values Missing
severity_index has 41 (2.0%) missing values Missing
reforms_dummy has 837 (40.2%) missing values Missing
length_employed has 109 (5.2%) missing values Missing
how_injury_occur is uniformly distributed Uniform
Dependent is an unsupported type, check if it needs cleaning or further analysis Unsupported
time_injury has 368 (17.7%) zeros Zeros
diff_carrier_employer has 473 (22.7%) zeros Zeros
diff_employer_injury has 1506 (72.4%) zeros Zeros

Variables

Dependent
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 2080
Missing (%): 100.0%
Memory size: 16.4 KiB

ave_wkly_wage
Real number (ℝ≥0)

MISSING

Distinct count: 464
Unique (%): 59.8%
Missing: 1304
Missing (%): 62.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 1138.7731958762886
Minimum: 2.0
Maximum: 9000.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 150
Q1: 500
median: 960
Q3: 1503.5
95-th percentile: 2624.25
Maximum: 9000
Range: 8998
Interquartile range (IQR): 1003.5

Descriptive statistics

Standard deviation: 995.9269763
Coefficient of variation (CV): 0.8745613085
Kurtosis: 14.52735374
Mean: 1138.773196
Median Absolute Deviation (MAD): 498.5
Skewness: 2.94457534
Sum: 883688
Variance: 991870.542
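
Several of the statistics above are simple functions of the others. As a rough check, the sketch below recomputes them from the reported figures for ave_wkly_wage; the variable names are illustrative, and the inputs are the rounded values printed in this report, so agreement is only expected to within rounding.

```python
# Reported figures for ave_wkly_wage (copied from this report).
n_observations = 2080
n_missing = 1304
total = 883688.0
minimum, maximum = 2.0, 9000.0
q1, q3 = 500.0, 1503.5
std = 995.9269763

n_present = n_observations - n_missing   # values actually observed
mean = total / n_present                 # mean over non-missing values
value_range = maximum - minimum          # Range
iqr = q3 - q1                            # Interquartile range
cv = std / mean                          # Coefficient of variation
variance = std ** 2                      # Variance is the squared std

print(n_present, round(mean, 6), value_range, iqr)
print(round(cv, 6), round(variance, 3))
```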
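
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
500 | 47 | 2.3%
1000 | 29 | 1.4%
320 | 25 | 1.2%
600 | 21 | 1.0%
150 | 20 | 1.0%
1500 | 16 | 0.8%
100 | 16 | 0.8%
400 | 16 | 0.8%
800 | 10 | 0.5%
300 | 9 | 0.4%
Other values (454) | 567 | 27.3%
(Missing) | 1304 | 62.7%

Minimum 5 values

Value | Count | Frequency (%)
2 | 2 | 0.1%
6 | 1 | < 0.1%
10 | 1 | < 0.1%
16 | 1 | < 0.1%
19 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
9000 | 1 | < 0.1%
8304 | 1 | < 0.1%
7181 | 1 | < 0.1%
7142 | 1 | < 0.1%
6000 | 2 | 0.1%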

body_part
Categorical

Distinct count: 44
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Finger(s) | 173 | 8.3%
Knee | 168 | 8.1%
Low Back Area | 162 | 7.8%
Other Facial Soft Tissue | 158 | 7.6%
Ankle | 142 | 6.8%
Eye(s) | 136 | 6.5%
Hand | 118 | 5.7%
Lower Leg | 87 | 4.2%
Shoulder(s) | 84 | 4.0%
Wrist | 79 | 3.8%
Other values (34) | 773 | 37.2%

Length

Max length: 54
Median length: 6
Mean length: 10.40673077
Min length: 3

cause
Categorical

HIGH CORRELATION

Distinct count: 10
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Strain or Injury By | 553 | 26.6%
Fall, Slip or Trip Injury | 368 | 17.7%
Struck or Injured By | 343 | 16.5%
Cut, Puncture, Scrape Injured By | 255 | 12.3%
Miscellaneous Causes | 251 | 12.1%
Striking Against or Stepping on | 143 | 6.9%
Caught In, Under or Between | 85 | 4.1%
Motor Vehicle | 53 | 2.5%
Burn or Scald - Heat or Cold Exposure | 28 | 1.3%
Rubbed or Abraded By | 1 | < 0.1%

Length

Max length: 37
Median length: 20
Mean length: 23.18269231
Min length: 13

claimant_age
Real number (ℝ≥0)

MISSING

Distinct count: 62
Unique (%): 3.5%
Missing: 308
Missing (%): 14.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.090857787810386
Minimum: 1.0
Maximum: 88.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 22.55
Q1: 30
median: 40
Q3: 49
95-th percentile: 60
Maximum: 88
Range: 87
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.0022201
Coefficient of variation (CV): 0.2993754877
Kurtosis: -0.4956951031
Mean: 40.09085779
Median Absolute Deviation (MAD): 9
Skewness: 0.2236443653
Sum: 71041
Variance: 144.0532873

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
39 | 60 | 2.9%
41 | 57 | 2.7%
30 | 55 | 2.6%
46 | 53 | 2.5%
38 | 53 | 2.5%
42 | 53 | 2.5%
27 | 52 | 2.5%
40 | 52 | 2.5%
25 | 52 | 2.5%
26 | 50 | 2.4%
Other values (52) | 1235 | 59.4%
(Missing) | 308 | 14.8%

Minimum 5 values

Value | Count | Frequency (%)
1 | 2 | 0.1%
10 | 2 | 0.1%
16 | 1 | < 0.1%
17 | 4 | 0.2%
18 | 9 | 0.4%

Maximum 5 values

Value | Count | Frequency (%)
88 | 1 | < 0.1%
77 | 1 | < 0.1%
74 | 2 | 0.1%
72 | 2 | 0.1%
71 | 2 | 0.1%

atty_firm_name
Categorical

HIGH CARDINALITY
MISSING

Distinct count: 195
Unique (%): 50.6%
Missing: 1695
Missing (%): 81.5%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
TLEVY, STERN & FORD | 10 | 0.5%
TJ. LEEDS BARROL, IV ATTORNEY AT LAW | 8 | 0.4%
TSHERMAN, FEDERMAN SAMBUR & LEVINE | 7 | 0.3%
TLEVY, FORD & WALLACH | 6 | 0.3%
TGORDON, EDELSTEIN, KREPACK, ET AL | 6 | 0.3%
TLAW OFFICE OF CHRISTINE T. NELSON | 6 | 0.3%
TSCHUMMER, ROLBIN, & HURST | 6 | 0.3%
TTHE KLEIN LAW GROUP P.C. | 5 | 0.2%
TRANIERI & NEWMAN | 5 | 0.2%
TKUSION & CAMPANA | 5 | 0.2%
Other values (185) | 321 | 15.4%
(Missing) | 1695 | 81.5%

Length

Max length: 57
Median length: 3
Mean length: 7.348557692
Min length: 3

gender
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Male | 1661 | 79.9%
Female | 411 | 19.8%
Uknown | 8 | 0.4%

Length

Max length: 6
Median length: 4
Mean length: 4.402884615
Min length: 4

marital_status
Categorical

MISSING

Distinct count: 3
Unique (%): 0.9%
Missing: 1742
Missing (%): 83.8%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Unmarried, Single, Widowed, Divorced | 187 | 9.0%
Married | 149 | 7.2%
Separated | 2 | 0.1%
(Missing) | 1742 | 83.8%

Length

Max length: 36
Median length: 3
Mean length: 6.259134615
Min length: 3

claim_st
Categorical

Distinct count: 44
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
California | 1186 | 57.0%
New York | 130 | 6.2%
New Mexico | 85 | 4.1%
Georgia | 80 | 3.8%
North Carolina | 79 | 3.8%
Texas | 71 | 3.4%
Louisiana | 62 | 3.0%
New Jersey | 53 | 2.5%
Illinois | 32 | 1.5%
Pennsylvania | 25 | 1.2%
Other values (34) | 277 | 13.3%

Length

Max length: 15
Median length: 10
Mean length: 9.4625
Min length: 4

depart_code
Real number (ℝ≥0)

MISSING

Distinct count: 21
Unique (%): 2.1%
Missing: 1095
Missing (%): 52.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.22741116751269
Minimum: 1.0
Maximum: 23.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 6
median: 14
Q3: 18
95-th percentile: 21
Maximum: 23
Range: 22
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 6.880886977
Coefficient of variation (CV): 0.5627427493
Kurtosis: -1.438882831
Mean: 12.22741117
Median Absolute Deviation (MAD): 6
Skewness: -0.1487988687
Sum: 12044
Variance: 47.34660559

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
21 | 156 | 7.5%
8 | 129 | 6.2%
17 | 128 | 6.2%
3 | 90 | 4.3%
6 | 74 | 3.6%
18 | 64 | 3.1%
2 | 62 | 3.0%
14 | 62 | 3.0%
11 | 59 | 2.8%
1 | 30 | 1.4%
Other values (11) | 131 | 6.3%
(Missing) | 1095 | 52.6%

Minimum 5 values

Value | Count | Frequency (%)
1 | 30 | 1.4%
2 | 62 | 3.0%
3 | 90 | 4.3%
4 | 11 | 0.5%
5 | 12 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
23 | 5 | 0.2%
22 | 16 | 0.8%
21 | 156 | 7.5%
20 | 20 | 1.0%
19 | 21 | 1.0%

detail_cause
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct count: 56
Unique (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Strain/Injury by Misc | 204 | 9.8%
Strain/Injury by Lifting | 151 | 7.3%
Struck by Falling/Flying Object | 130 | 6.2%
Fall/Slip, Same Level | 127 | 6.1%
Misc, Foreign Body in Eye | 114 | 5.5%
Strike/Step On, Fixed Object | 110 | 5.3%
Cut/Puncture/Scrape, Object Lift/Handled | 108 | 5.2%
Misc, Other | 104 | 5.0%
Fall/Slip, Misc | 87 | 4.2%
Cut/Puncture/Scrape, Hand Tool | 67 | 3.2%
Other values (46) | 878 | 42.2%

Length

Max length: 40
Median length: 25
Mean length: 25.66538462
Min length: 9

domestic_foreign
Categorical

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Domestic | 2061 | 99.1%
Foreign | 19 | 0.9%

Length

Max length: 8
Median length: 8
Mean length: 7.990865385
Min length: 7

employ_status
Categorical

Distinct count: 8
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Unknown/Other | 1728 | 83.1%
Full-Time | 307 | 14.8%
Seasonal | 20 | 1.0%
Part-Time | 12 | 0.6%
Piece Worker | 7 | 0.3%
On Strike | 3 | 0.1%
Disabled | 2 | 0.1%
Retired | 1 | < 0.1%

Length

Max length: 13
Median length: 13
Mean length: 12.32163462
Min length: 7

handling_office
Categorical

Distinct count: 28
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
LOS ANGELE | 975 | 46.9%
SACRAMENTO | 263 | 12.6%
DALLAS WC | 177 | 8.5%
NEW JERSEY | 108 | 5.2%
CHARLOTTE | 78 | 3.8%
LONG ISLAN | 59 | 2.8%
IN-STATE A | 57 | 2.7%
WC SOUTHEA | 56 | 2.7%
ATLANTA | 51 | 2.5%
ILLINOIS | 42 | 2.0%
Other values (18) | 214 | 10.3%

Length

Max length: 10
Median length: 10
Mean length: 9.675961538
Min length: 6

how_injury_occur
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count: 2077
Unique (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
UNKNOWN | 2 | 0.1%
WHILE WORKING IN A WOODED - TALL GRASS AREA - EE WAS BITTEN | 2 | 0.1%
EMPLOYEE WAS WORKING ON SET WHEN HE SUSTAINED AN INSECT BITE | 2 | 0.1%
HE WAS WORKING ON HIS HANDS AND KNEES PUTTING LAYOUT PAPER D | 1 | < 0.1%
TRYING TO MOVE EQUIPMENT ON THE TRUCK AND WAS STRUCK IN THE | 1 | < 0.1%
WHILE PULLING THE SAFETY TENSION LINE EE FELT A POP IN HIS R | 1 | < 0.1%
WHILE EMPLOYEE CARRYING EQUIPMENT ACROSS A DARK PARKING LOT | 1 | < 0.1%
EMPLOYEE WAS MOVING TOOL BOX FROM STAGE TO LOCK UP WHEN IT R | 1 | < 0.1%
WHILE EE ON SET LOCATION UNLOADING PROPERTY EQUIPMENT HE STR | 1 | < 0.1%
ON A RAINY DAY - EMPLOYEE WAS WALKING INTO A BUILDING (SET) | 1 | < 0.1%
Other values (2067) | 2067 | 99.4%

Length

Max length: 60
Median length: 60
Mean length: 57.67163462
Min length: 7

injury_city
Categorical

HIGH CARDINALITY

Distinct count: 392
Unique (%): 18.8%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
LOS ANGELES | 308 | 14.8%
BURBANK | 215 | 10.3%
UNKNOWN | 207 | 10.0%
NEW ORLEANS | 97 | 4.7%
NEW YORK | 51 | 2.5%
BROOKLYN | 48 | 2.3%
WILMINGTON | 42 | 2.0%
CULVER CITY | 38 | 1.8%
ATLANTA | 28 | 1.3%
AUSTIN | 27 | 1.3%
Other values (382) | 1019 | 49.0%

Length

Max length: 19
Median length: 8
Mean length: 9.078846154
Min length: 2

injury_postal
Categorical

HIGH CARDINALITY
MISSING

Distinct count: 550
Unique (%): 35.1%
Missing: 513
Missing (%): 24.7%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
91502 | 194 | 9.3%
95816 | 55 | 2.6%
90038 | 38 | 1.8%
91505 | 29 | 1.4%
90028 | 26 | 1.2%
91608 | 25 | 1.2%
90001 | 24 | 1.2%
91504 | 24 | 1.2%
90232 | 21 | 1.0%
90009 | 20 | 1.0%
Other values (540) | 1111 | 53.4%
(Missing) | 513 | 24.7%

Length

Max length: 9
Median length: 5
Mean length: 4.495192308
Min length: 3

injury_state
Categorical

Distinct count: 37
Unique (%): 1.8%
Missing: 1
Missing (%): < 0.1%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
California | 1092 | 52.5%
Louisiana | 174 | 8.4%
New York | 171 | 8.2%
Georgia | 93 | 4.5%
North Carolina | 75 | 3.6%
Texas | 52 | 2.5%
Hawaii | 48 | 2.3%
Michigan | 43 | 2.1%
New Mexico | 43 | 2.1%
Illinois | 35 | 1.7%
Other values (27) | 253 | 12.2%

Length

Max length: 20
Median length: 10
Mean length: 9.455769231
Min length: 3

jurisdiction
Categorical

Distinct count: 34
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
California | 1243 | 59.8%
New York | 194 | 9.3%
Louisiana | 130 | 6.2%
Georgia | 81 | 3.9%
North Carolina | 77 | 3.7%
Texas | 42 | 2.0%
Illinois | 37 | 1.8%
New Mexico | 34 | 1.6%
Hawaii | 32 | 1.5%
Pennsylvania | 31 | 1.5%
Other values (24) | 179 | 8.6%

Length

Max length: 20
Median length: 10
Mean length: 9.515384615
Min length: 4

lost_time_or_medicalonly
Categorical

Distinct count: 2
Unique (%): 0.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Medical Only | 1581 | 76.0%
Lost Time | 498 | 23.9%
(Missing) | 1 | < 0.1%

Length

Max length: 12
Median length: 12
Mean length: 11.27740385
Min length: 3

nature_injury
Categorical

HIGH CORRELATION

Distinct count: 36
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Strain | 477 | 22.9%
Laceration | 341 | 16.4%
Specific Injury - All Other | 309 | 14.9%
Contusion | 234 | 11.2%
Sprain | 144 | 6.9%
Foreign Body | 102 | 4.9%
Puncture | 94 | 4.5%
Fracture | 82 | 3.9%
Inflammation | 77 | 3.7%
Infection | 31 | 1.5%
Other values (26) | 189 | 9.1%

Length

Max length: 59
Median length: 9
Mean length: 11.71009615
Min length: 4

#dependents
Real number (ℝ≥0)

MISSING

Distinct count: 5
Unique (%): 7.8%
Missing: 2016
Missing (%): 96.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.953125
Minimum: 1.0
Maximum: 5.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 2
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.075405977
Coefficient of variation (CV): 0.5506078603
Kurtosis: 0.4923951058
Mean: 1.953125
Median Absolute Deviation (MAD): 1
Skewness: 1.044102286
Sum: 125
Variance: 1.156498016

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 28 | 1.3%
2 | 19 | 0.9%
3 | 11 | 0.5%
4 | 4 | 0.2%
5 | 2 | 0.1%
(Missing) | 2016 | 96.9%

Minimum 5 values

Value | Count | Frequency (%)
1 | 28 | 1.3%
2 | 19 | 0.9%
3 | 11 | 0.5%
4 | 4 | 0.2%
5 | 2 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
5 | 2 | 0.1%
4 | 4 | 0.2%
3 | 11 | 0.5%
2 | 19 | 0.9%
1 | 28 | 1.3%

osha_injury_type
Categorical

HIGH CORRELATION

Distinct count: 6
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Injury | 2049 | 98.5%
Skin disorder | 20 | 1.0%
All other illnesses | 3 | 0.1%
Hearing loss | 3 | 0.1%
Respiratory condition | 3 | 0.1%
Poisoning | 2 | 0.1%

Length

Max length: 21
Median length: 6
Mean length: 6.119230769
Min length: 6

severity_index
Categorical

MISSING

Distinct count: 7
Unique (%): 0.3%
Missing: 41
Missing (%): 2.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
No Serious Injury Indicated | 2026 | 97.4%
Back Injury involving Surgery/Extended Disability | 7 | 0.3%
Fractured Bone(s) | 2 | 0.1%
Involves AIDS, Herpes, TSS, Cancer, Other Diseases | 1 | < 0.1%
Heart Attack or Cardio-Vascular Accident | 1 | < 0.1%
Serious Burns | 1 | < 0.1%
Fatality | 1 | < 0.1%
(Missing) | 41 | 2.0%

Length

Max length: 50
Median length: 27
Mean length: 26.59278846
Min length: 3

time_injury
Real number (ℝ≥0)

ZEROS

Distinct count: 252
Unique (%): 12.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 988.2278846153846
Minimum: 0
Maximum: 2358
Zeros: 368
Zeros (%): 17.7%
Memory size: 16.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 600
median: 1030
Q3: 1500
95-th percentile: 2000
Maximum: 2358
Range: 2358
Interquartile range (IQR): 900

Descriptive statistics

Standard deviation: 649.793849
Coefficient of variation (CV): 0.65753442
Kurtosis: -0.9322108012
Mean: 988.2278846
Median Absolute Deviation (MAD): 470
Skewness: -0.157018299
Sum: 2055514
Variance: 422232.0462

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 368 | 17.7%
1100 | 82 | 3.9%
1000 | 79 | 3.8%
1500 | 74 | 3.6%
800 | 61 | 2.9%
1400 | 61 | 2.9%
1600 | 53 | 2.5%
830 | 47 | 2.3%
1030 | 46 | 2.2%
900 | 46 | 2.2%
Other values (242) | 1163 | 55.9%

Minimum 5 values

Value | Count | Frequency (%)
0 | 368 | 17.7%
5 | 1 | < 0.1%
10 | 5 | 0.2%
14 | 1 | < 0.1%
15 | 9 | 0.4%

Maximum 5 values

Value | Count | Frequency (%)
2358 | 1 | < 0.1%
2345 | 3 | 0.1%
2330 | 3 | 0.1%
2315 | 1 | < 0.1%
2300 | 16 | 0.8%
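
The value range of time_injury (0 to 2358, with frequent values such as 830, 1030 and 1500) suggests 24-hour clock times stored as HHMM integers. The report does not say so, so the sketch below is only an illustration under that assumption; `hhmm_to_minutes` is a hypothetical helper, and whether 0 means midnight or an unrecorded time is not stated in the report.

```python
def hhmm_to_minutes(value):
    """Convert an HHMM integer such as 1030 to minutes after midnight.

    Returns None when the value is not a valid clock time
    (assumption: valid hours are 0-23 and valid minutes are 0-59).
    """
    hours, minutes = divmod(int(value), 100)
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        return None
    return hours * 60 + minutes


# A few values that appear in the frequency tables of this report.
for raw in (0, 830, 1030, 2358):
    print(raw, hhmm_to_minutes(raw))
```

Under this reading, summary statistics such as the mean (988.2) describe the raw HHMM integers rather than elapsed time, so a conversion of this kind would be needed before averaging.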
 

type_loss
Categorical

HIGH CORRELATION

Distinct count: 2
Unique (%): 0.1%
Missing: 4
Missing (%): 0.2%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
Specific Injury | 2058 | 98.9%
Cumulative Trauma | 18 | 0.9%
(Missing) | 4 | 0.2%

Length

Max length: 17
Median length: 15
Mean length: 14.99423077
Min length: 3

policy_yr
Real number (ℝ≥0)

Distinct count: 15
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2007.5461538461539
Minimum: 2000
Maximum: 2014
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.2 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 2001
Q1: 2004
median: 2008
Q3: 2011
95-th percentile: 2014
Maximum: 2014
Range: 14
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.240234755
Coefficient of variation (CV): 0.00211214808
Kurtosis: -1.207004565
Mean: 2007.546154
Median Absolute Deviation (MAD): 4
Skewness: -0.1583360318
Sum: 4175696
Variance: 17.97959078

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2011 | 191 | 9.2%
2013 | 186 | 8.9%
2012 | 163 | 7.8%
2010 | 153 | 7.4%
2005 | 152 | 7.3%
2004 | 150 | 7.2%
2007 | 132 | 6.3%
2008 | 131 | 6.3%
2009 | 129 | 6.2%
2014 | 129 | 6.2%
Other values (5) | 564 | 27.1%

Minimum 5 values

Value | Count | Frequency (%)
2000 | 102 | 4.9%
2001 | 101 | 4.9%
2002 | 125 | 6.0%
2003 | 125 | 6.0%
2004 | 150 | 7.2%

Maximum 5 values

Value | Count | Frequency (%)
2014 | 129 | 6.2%
2013 | 186 | 8.9%
2012 | 163 | 7.8%
2011 | 191 | 9.2%
2010 | 153 | 7.4%

reforms_dummy
Categorical

MISSING

Distinct count: 3
Unique (%): 0.2%
Missing: 837
Missing (%): 40.2%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
California Refom 1 | 661 | 31.8%
California Refom 0 | 434 | 20.9%
California Reform 2 | 148 | 7.1%
(Missing) | 837 | 40.2%

Length

Max length: 19
Median length: 18
Mean length: 12.03509615
Min length: 3

length_employed
Real number (ℝ≥0)

MISSING

Distinct count: 25
Unique (%): 1.3%
Missing: 109
Missing (%): 5.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.789954337899544
Minimum: 1.0
Maximum: 49.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 7
Q3: 11
95-th percentile: 15
Maximum: 49
Range: 48
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.659635565
Coefficient of variation (CV): 0.598159548
Kurtosis: 6.129521989
Mean: 7.789954338
Median Absolute Deviation (MAD): 4
Skewness: 1.086851841
Sum: 15354
Variance: 21.7122036

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
4 | 183 | 8.8%
2 | 182 | 8.8%
3 | 145 | 7.0%
5 | 145 | 7.0%
10 | 142 | 6.8%
11 | 142 | 6.8%
12 | 135 | 6.5%
13 | 133 | 6.4%
6 | 125 | 6.0%
7 | 124 | 6.0%
Other values (15) | 515 | 24.8%

Minimum 5 values

Value | Count | Frequency (%)
1 | 96 | 4.6%
2 | 182 | 8.8%
3 | 145 | 7.0%
4 | 183 | 8.8%
5 | 145 | 7.0%

Maximum 5 values

Value | Count | Frequency (%)
49 | 1 | < 0.1%
46 | 1 | < 0.1%
43 | 1 | < 0.1%
33 | 1 | < 0.1%
27 | 1 | < 0.1%

diff_carrier_employer
Real number (ℝ)

ZEROS

Distinct count: 92
Unique (%): 4.5%
Missing: 13
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.8916303821964195
Minimum: -79.0
Maximum: 496.0
Zeros: 473
Zeros (%): 22.7%
Memory size: 16.2 KiB

Quantile statistics

Minimum: -79
5-th percentile: 0
Q1: 1
median: 2
Q3: 5
95-th percentile: 26.7
Maximum: 496
Range: 575
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 23.55366805
Coefficient of variation (CV): 3.417720734
Kurtosis: 146.5467193
Mean: 6.891630382
Median Absolute Deviation (MAD): 2
Skewness: 10.2292231
Sum: 14245
Variance: 554.7752784

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 474 | 22.8%
0 | 473 | 22.7%
3 | 206 | 9.9%
2 | 195 | 9.4%
4 | 135 | 6.5%
5 | 116 | 5.6%
6 | 84 | 4.0%
7 | 58 | 2.8%
8 | 33 | 1.6%
9 | 25 | 1.2%
Other values (82) | 268 | 12.9%

Minimum 5 values

Value | Count | Frequency (%)
-79 | 1 | < 0.1%
-1 | 7 | 0.3%
0 | 473 | 22.7%
1 | 474 | 22.8%
2 | 195 | 9.4%

Maximum 5 values

Value | Count | Frequency (%)
496 | 1 | < 0.1%
345 | 1 | < 0.1%
264 | 1 | < 0.1%
253 | 1 | < 0.1%
252 | 1 | < 0.1%

diff_employer_injury
Real number (ℝ≥0)

ZEROS

Distinct count: 97
Unique (%): 4.7%
Missing: 13
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.796323173681664
Minimum: 0.0
Maximum: 1958.0
Zeros: 1506
Zeros (%): 72.4%
Memory size: 16.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 19
Maximum: 1958
Range: 1958
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 117.3032394
Coefficient of variation (CV): 7.425983764
Kurtosis: 136.9655436
Mean: 15.79632317
Median Absolute Deviation (MAD): 0
Skewness: 10.9244156
Sum: 32651
Variance: 13760.04998

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 1506 | 72.4%
1 | 176 | 8.5%
2 | 70 | 3.4%
3 | 54 | 2.6%
4 | 43 | 2.1%
6 | 23 | 1.1%
5 | 18 | 0.9%
7 | 14 | 0.7%
11 | 9 | 0.4%
10 | 8 | 0.4%
Other values (87) | 146 | 7.0%
(Missing) | 13 | 0.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1506 | 72.4%
1 | 176 | 8.5%
2 | 70 | 3.4%
3 | 54 | 2.6%
4 | 43 | 2.1%

Maximum 5 values

Value | Count | Frequency (%)
1958 | 1 | < 0.1%
1782 | 1 | < 0.1%
1763 | 1 | < 0.1%
1605 | 1 | < 0.1%
1403 | 1 | < 0.1%

shift
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.2 KiB

Value | Count | Frequency (%)
2nd | 945 | 45.4%
1st | 854 | 41.1%
3rd | 281 | 13.5%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

length_how_injury
Real number (ℝ≥0)

Distinct count: 41
Unique (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 57.67163461538462
Minimum: 7
Maximum: 60
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.2 KiB

Quantile statistics

Minimum: 7
5-th percentile: 45
Q1: 59
median: 60
Q3: 60
95-th percentile: 60
Maximum: 60
Range: 53
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 6.020436833
Coefficient of variation (CV): 0.1043916455
Kurtosis: 15.75156812
Mean: 57.67163462
Median Absolute Deviation (MAD): 0
Skewness: -3.680373736
Sum: 119957
Variance: 36.24565966

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
60 | 1357 | 65.2%
59 | 349 | 16.8%
58 | 34 | 1.6%
55 | 28 | 1.3%
54 | 26 | 1.2%
53 | 25 | 1.2%
57 | 22 | 1.1%
51 | 22 | 1.1%
56 | 21 | 1.0%
47 | 18 | 0.9%
Other values (31) | 178 | 8.6%

Minimum 5 values

Value | Count | Frequency (%)
7 | 2 | 0.1%
17 | 1 | < 0.1%
20 | 1 | < 0.1%
21 | 1 | < 0.1%
22 | 2 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
60 | 1357 | 65.2%
59 | 349 | 16.8%
58 | 34 | 1.6%
57 | 22 | 1.1%
56 | 21 | 1.0%
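
The statistics of length_how_injury match the length statistics reported for how_injury_occur (minimum 7, maximum 60, mean 57.67), which suggests that this column is the character count of the free-text description. The sketch below assumes that derivation; the three descriptions are copied from values shown elsewhere in this report.

```python
# Free-text values that appear in this report for how_injury_occur.
descriptions = [
    "UNKNOWN",
    "ELECTRIC WORK CART SLAMMED ON MY HAND",
    "EE ALLEGES R SCAPULA INJURIES OF UNKNOWN CAUSE",
]

# Assumed derivation: length_how_injury = number of characters in the text.
lengths = [len(text) for text in descriptions]

for text, n_chars in zip(descriptions, lengths):
    print(n_chars, text)
```

Because 65.2% of the rows have the maximum length of 60, the source text appears to be cut off at 60 characters, so this feature mostly separates short descriptions from truncated ones.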
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the slope of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
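
As a minimal illustration of that definition, the sketch below computes r in plain Python. The function name `pearson_r` and the sample data are made up for this example, and it is not the code path the report itself uses. The second call rescales and shifts both variables to show the invariance mentioned above.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The common 1/n factors cancel, so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Changing location and (positive) scale of either variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```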

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
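
The same recipe can be sketched in plain Python: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names below are hypothetical, and giving tied values their average rank is an assumption (it is the usual convention, but the report does not spell it out).

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

On this monotonic but nonlinear example ρ equals 1 while r stays below 1, which is the difference the paragraph above describes.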

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
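
The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that definition in plain Python; the function name is illustrative, and statistical libraries commonly report a tie-adjusted variant (τ-b) instead, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie; it counts toward neither group
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]   # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```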

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
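
For orientation, here is a sketch of a bias-corrected Cramér's V computed from a contingency table of counts, following the correction commonly attributed to Bergsma (2013). It is an illustration written for this document, not the report's own code; the function name is hypothetical, and the table is assumed to have no empty rows or columns.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction of phi^2 and of the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


independent = [[10, 10], [10, 10]]
perfect = [[50, 0], [0, 50]]
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```

The two toy tables cover the endpoints described above: an independent table gives 0 and a perfectly associated one gives 1.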

Missing values

Sample

First rows

(index) | Dependent | ave_wkly_wage | body_part | cause | claimant_age | atty_firm_name | gender | marital_status | claim_st | depart_code | detail_cause | domestic_foreign | employ_status | handling_office | how_injury_occur | injury_city | injury_postal | injury_state | jurisdiction | lost_time_or_medicalonly | nature_injury | #dependents | osha_injury_type | severity_index | time_injury | type_loss | policy_yr | reforms_dummy | length_employed | diff_carrier_employer | diff_employer_injury | shift | length_how_injury
0 | NaN | NaN | Wrist | Strain or Injury By | 37.0 | NaN | Male | NaN | California | NaN | Strain/Injury by Repetitive Motion | Domestic | Unknown/Other | LOS ANGELE | EE CLAIMS: CT 1991 -4/10/01 TO BILATERAL HANDS - WRISTS AND | BURBANK | 91502 | California | California | Lost Time | Carpal Tunnel Syndrome | NaN | Injury | No Serious Injury Indicated | 0 | Cumulative Trauma | 2001 | California Refom 1 | 15.0 | 52.0 | 1403.0 | 1st | 60
1 | NaN | 465.0 | Multiple Lower Extremities | Strain or Injury By | 45.0 | NaN | Male | NaN | California | NaN | Strain/Injury by Repetitive Motion | Domestic | Full-Time | LOS ANGELE | EE CLAIMS : CT 1990 -04 -01 -03 TO LOW BACK (L) SIDE (L) LEG | BURBANK | 91502 | California | California | Lost Time | Strain | NaN | Injury | No Serious Injury Indicated | 0 | Specific Injury | 2003 | California Refom 1 | 19.0 | 8.0 | 911.0 | 1st | 60
2 | NaN | 237.0 | Knee | Strain or Injury By | 38.0 | NaN | Male | NaN | California | NaN | Strain/Injury by Repetitive Motion | Domestic | Unknown/Other | LOS ANGELE | EMPLOYEE CLAIMS: CT 12/6/01 - 12/6/02 TO LEFT KNEE FROM REPE | BURBANK | 91502 | California | California | Lost Time | Strain | NaN | Injury | No Serious Injury Indicated | 0 | Specific Injury | 2002 | California Refom 1 | 13.0 | 0.0 | 1258.0 | 1st | 60
3 | NaN | 1378.0 | Multiple Body Parts | Fall, Slip or Trip Injury | NaN | NaN | Male | NaN | California | NaN | Fall/Slip, Different Level | Domestic | Full-Time | LOS ANGELE | WALKING ON PLYWOOD RAMP, RAMP SLIPPED AND HE FELL ON R SIDE. | LAS VEGAS | 89101 | Nevada | California | Lost Time | Multiple Injury, Physical Only | NaN | Injury | No Serious Injury Indicated | 730 | Specific Injury | 2001 | California Refom 0 | 14.0 | 1.0 | 0.0 | 1st | 60
4 | NaN | 888.0 | Multiple Body Parts | Miscellaneous Causes | 26.0 | NaN | Male | NaN | California | NaN | Misc, Other | Domestic | Full-Time | SACRAMENTO | EE ALLEGES R SCAPULA INJURIES OF UNKNOWN CAUSE | UNKNOWN | 91502 | California | California | Medical Only | Multiple Injury, Physical Only | NaN | Injury | No Serious Injury Indicated | 0 | Specific Injury | 2001 | California Refom 0 | 14.0 | 26.0 | 16.0 | 1st | 46
5 | NaN | 1300.0 | Other Facial Soft Tissue | Struck or Injured By | NaN | NaN | Male | NaN | California | NaN | Struck by Object Lifted/Handled | Domestic | Full-Time | SACRAMENTO | FOLDING A CLOTHING RACK, IT COLLAPSED AND STRUCK EES HEAD | LAKE ARROWHEAD | 92352 | California | California | Lost Time | Contusion | NaN | Injury | No Serious Injury Indicated | 1030 | Specific Injury | 2001 | California Refom 0 | 15.0 | 18.0 | 0.0 | 2nd | 57
6 | NaN | NaN | Hand | Caught In, Under or Between | NaN | NaN | Male | NaN | California | NaN | Caught In/Between, Machinery | Domestic | Full-Time | SACRAMENTO | ELECTRIC WORK CART SLAMMED ON MY HAND | VALENCIA | 91355 | California | California | Medical Only | Crushing | NaN | Injury | No Serious Injury Indicated | 1150 | Specific Injury | 2001 | California Refom 0 | NaN | 68.0 | 0.0 | 2nd | 37
7 | NaN | 735.0 | Multiple Body Parts | Strain or Injury By | NaN | NaN | Male | NaN | California | NaN | Strain/Injury by Jumping | Domestic | Full-Time | LOS ANGELE | EE JUMPED OFF LADDER, FRACTURED R FOOT AND SPRAINED R ANKLE. | LOS ANGELES | 91502 | California | California | Lost Time | Fracture | NaN | Injury | No Serious Injury Indicated | 1500 | Specific Injury | 2001 | California Refom 0 | 14.0 | 0.0 | 3.0 | 2nd | 60
8 | NaN | NaN | Multiple Body Parts | Miscellaneous Causes | 18.0 | NaN | Male | NaN | California | NaN | Misc, Other | Domestic | Unknown/Other | SACRAMENTO | EE CAME INTO WORK FEELING SINUS PRESSURE & NAUSEA | UNKNOWN | 91355 | California | California | Medical Only | Multiple Injury, Physical Only | NaN | Injury | No Serious Injury Indicated | 0 | Specific Injury | 2001 | California Refom 0 | 15.0 | 96.0 | 0.0 | 1st | 49
9 | NaN | 1967.0 | Chest | Fall, Slip or Trip Injury | NaN | NaN | Male | NaN | California | NaN | Fall/Slip, Misc | Domestic | Unknown/Other | SACRAMENTO | HANDLING FOOD SUBSTANCE, SLIPPED & FELL ON GREENBED AREA | LOS ANGELES | 90001 | California | California | Medical Only | Multiple Injury, Physical Only | NaN | Injury | No Serious Injury Indicated | 1630 | Specific Injury | 2001 | California Refom 0 | 14.0 | 6.0 | 0.0 | 2nd | 56

Last rows

(index) | Dependent | ave_wkly_wage | body_part | cause | claimant_age | atty_firm_name | gender | marital_status | claim_st | depart_code | detail_cause | domestic_foreign | employ_status | handling_office | how_injury_occur | injury_city | injury_postal | injury_state | jurisdiction | lost_time_or_medicalonly | nature_injury | #dependents | osha_injury_type | severity_index | time_injury | type_loss | policy_yr | reforms_dummy | length_employed | diff_carrier_employer | diff_employer_injury | shift | length_how_injury
2070 | NaN | NaN | Knee | Fall, Slip or Trip Injury | 40.0 | NaN | Male | NaN | Georgia | 17.0 | Fall/Slip, Same Level | Domestic | Unknown/Other | WC SOUTHEA | EE WAS WALKING AND TRIPPED ALONG AND TRIPPED OVER WOOD AND E | FAYETTEVILLE | 30214 | Georgia | Georgia | Medical Only | Strain | NaN | Injury | No Serious Injury Indicated | 1500 | Specific Injury | 2014 | NaN | 1.0 | 2.0 | 1.0 | 2nd | 60
2071 | NaN | NaN | Low Back Area | Strain or Injury By | 27.0 | NaN | Male | NaN | North Carolina | 22.0 | Strain/Injury by Lifting | Domestic | Unknown/Other | WC SOUTHEA | EMPLOYEE WAS LIFTING HEAVY OBJECTS AND SWITCHED TO A LIGHTER | WILMINGTON | 28411 | North Carolina | North Carolina | Medical Only | Strain | NaN | Injury | No Serious Injury Indicated | 1300 | Specific Injury | 2014 | NaN | 1.0 | 1.0 | 0.0 | 2nd | 60
2072 | NaN | NaN | Thigh | Burn or Scald - Heat or Cold Exposure | 32.0 | NaN | Female | NaN | Georgia | 21.0 | Burn/Exposure, Steam/Fluid | Domestic | Unknown/Other | WC SOUTHEA | CLEANING UP KITCHEN AFTER LUNCH, SPILLED BOILING HOT WATER O | ATLANTA | 30336 | Georgia | Georgia | Medical Only | Burn | NaN | Injury | No Serious Injury Indicated | 0 | Specific Injury | 2014 | NaN | 1.0 | 0.0 | 0.0 | 1st | 60
2073 | NaN | 41.0 | Ankle | Strain or Injury By | 40.0 | NaN | Male | NaN | Georgia | 17.0 | Strain/Injury by Misc | Domestic | Unknown/Other | WC SOUTHEA | EMPLOYEE WAS ROOFING SHINGLES, WHEN HE TWISTED HIS RIGHT ANK | SENOIA | NaN | Georgia | Georgia | Medical Only | Sprain | NaN | Injury | No Serious Injury Indicated | 1145 | Specific Injury | 2014 | NaN | 1.0 | 1.0 | 0.0 | 2nd | 60
2074 | NaN | NaN | Low Back Area | Fall, Slip or Trip Injury | 50.0 | NaN | Male | NaN | Georgia | 18.0 | Fall/Slip, Same Level | Domestic | Unknown/Other | WC SOUTHEA | EMPLOYEE HAS A POWERED MONITOR ATTACHED TO A CABLE. WHEN THE | ATLANTA | 30315 | Georgia | Georgia | Medical Only | Strain | NaN | Injury | No Serious Injury Indicated | 0 | Specific Injury | 2014 | NaN | 1.0 | 3.0 | 0.0 | 1st | 60
2075 | NaN | NaN | Other Facial Soft Tissue | Fall, Slip or Trip Injury | 48.0 | NaN | Female | NaN | Georgia | 17.0 | Fall/Slip, Misc | Domestic | Unknown/Other | WC SOUTHEA | MEDIC WAS CALLED TO THE MILL FOR PAINTER WHO HAD FALLEN. PAT | FAYETTEVILLE | 30214 | Georgia | Georgia | Medical Only | Contusion | NaN | Injury | No Serious Injury Indicated | 945 | Specific Injury | 2014 | NaN | 1.0 | 0.0 | 0.0 | 2nd | 60
2076 | NaN | NaN | Foot | Struck or Injured By | 25.0 | NaN | Female | NaN | Georgia | 14.0 | Struck by Object Lifted/Handled | Domestic | Unknown/Other | WC SOUTHEA | WHILE PUSHING CARTS ONE ROLLED OVER HER FOOT. ANDREA FLADER, | WILLIAMSON | 30292 | Georgia | Georgia | Medical Only | Sprain | NaN | Injury | No Serious Injury Indicated | 2130 | Specific Injury | 2014 | NaN | 1.0 | 0.0 | 1.0 | 3rd | 60
2077 | NaN | NaN | Knee | Strain or Injury By | 41.0 | NaN | Male | NaN | Georgia | 17.0 | Strain/Injury by Misc | Domestic | Unknown/Other | WC SOUTHEA | EMPLOYEE WAS WALKING ON LOOSE GRAVEL WHEN HE TWISTED HIS RIG | ATLANTA | 30316 | Georgia | Georgia | Medical Only | Strain | NaN | Injury | No Serious Injury Indicated | 1000 | Specific Injury | 2014 | NaN | 1.0 | 0.0 | 8.0 | 2nd | 60
2078 | NaN | NaN | Toe(s) | Struck or Injured By | 35.0 | NaN | Male | NaN | Maryland | 17.0 | Struck by Object Lifted/Handled | Domestic | Unknown/Other | WC SOUTHEA | MOVING WALL AND THE WALL CAME DOWN CRUSHING EE TOE. PRODUCTI | COLUMBIA | 21046 | Maryland | Maryland | Medical Only | Crushing | NaN | Injury | No Serious Injury Indicated | 1420 | Specific Injury | 2014 | NaN | 1.0 | 0.0 | 0.0 | 2nd | 60
2079 | NaN | NaN | Knee | Fall, Slip or Trip Injury | 47.0 | NaN | Male | NaN | Virginia | 18.0 | Fall/Slip, Same Level | Domestic | Unknown/Other | WC SOUTHEA | EE WAS PUSHING SOUND CART IN WOODS WHEN HE TRIPPED ON A TREE | RICHMOND | 23226 | Virginia | Virginia | NaN | Sprain | NaN | Injury | No Serious Injury Indicated | 1700 | Specific Injury | 2014 | NaN | 1.0 | 3.0 | 0.0 | 2nd | 60